knitr::opts_knit$set(root.dir = here::here())
In this file you will find the code we used to generate figures and tables for the paper A robust transcriptomic profile for Chronic Obstructive Pulmonary Disease through a meta-analysis. All code used in this analysis can be found in our Github repository.
For the analysis we selected the datasets by following these steps:
knitr::include_graphics(here::here("paper/fig/DataSelection.png"))
First, we serached datasets in Gene Eexpression Omnibus (GEO) from NCBI by using these terms: COPD, emphesema, chronic bronchitis, homo sapiens, GSE, microarray, RNAseq, NOT single cell on 17th March, 2021. For reproducibility purpose we provide the query used (added in supplementary table GEO_search):
Query: ((“chronic obstructive pulmonary disease” OR “COPD” OR “Emphysema” OR “chronic bronchitis”) AND (“Expression profiling by high throughput sequencing”[Filter] OR “Expression profiling by array”[Filter]) AND “Homo sapiens”[orgn] AND “gse”[Filter] NOT “single cell”
We downloaded the results and stored in /GEO_data/2021-03-17gds_result.txt
tail GEO_data/2021-03-17gds_result.txt
## FTP download: GEO (CEL) ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSEnnn/GSE994/
## Series Accession: GSE994 ID: 200000994
##
## 168. Chronic obstructive pulmonary disease
## (Submitter supplied) Diaphragm muscles in Chronic Obstructive Pulmonary Disease (COPD) patients undergo an adaptive fast to slow transformation that includes cellular adaptations. This project studies the signaling mechanisms responsible for this transformation. Keywords: other
## Organism: Homo sapiens
## Type: Expression profiling by array
## Dataset: GDS289 Platform: GPL96 7 Samples
## FTP download: GEO (CEL) ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSEnnn/GSE475/
## Series Accession: GSE475 ID: 200000475
Then, we parsed the information to have it in a table format by using /GEO_data/parsegeo.pl
cd GEO_data/
perl parsegeo.pl 2021-03-17gds_result.txt
## 168 entries found
## 168 unique IDs found
The parsed table is in /GEO_data/021-03-17gds_result_tab.txt and added in Supplementary tables `
df <- read.delim(here::here("GEO_data/2021-03-17gds_result_tab.txt"), sep = "\t")
kable(df[1:2,])
| No | Description | Extended.Description | Organism | Type.Source | No..Datasets | No..Series | No..Rel..Platforms | No..Samples | Platform | Series | Dataset | GEO.SRA | FTP | Type | Accession | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A transcriptomic atlas of mouse cerebellar cortex reveals novel cell types | (Submitter supplied) The cerebellar cortex is a well-studied brain structure with diverse roles in motor learning, coordination, cognition, and autonomic regulation. Nonetheless, a complete inventory of cerebellar cell types is presently lacking. We used high-throughput transcriptional profiling to profile more than 600,000 nuclei and molecularly define cell types across individual lobules of the adult mouse cerebellum. more.. | Homo sapiens; Mus musculus | Expression profiling by high throughput sequencing | NA | NA | NA | NA | NA | NA | NA | NA | ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE165nnn/GSE165371/ | Series | GSE165371 | 200165371 |
| 2 | Human Ex Vivo Lung Perfusion: A Model System to study Lung diseases and direct targeted therapies | (Submitter supplied) In the present study, we demonstrate the feasibility of the EVLP technique to perfuse lungs from deceased patients with a broad spectrum of chronic lung diseases. We were able to assess gene expression changes over the course of EVLP and between end-stage diseases by RNA sequencing transcriptomics from tissue samples collected at different time points, revealing fundamental genetic information which has important avenues for modulation in the future. more.. | Homo sapiens | Expression profiling by high throughput sequencing | NA | NA | NA | 26 | GPL18573 | NA | NA | CSV | ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE146nnn/GSE146436/ | Series | GSE146436 | 200146436 |
We also added columns to help manual curation (the columns added were SampleType and Disease). The table is provided in Supplementary Tables Parsed_table.
type <- "LUNG TISSUE|BLOOD|MACROPHAGE.|BALF|SPUTUM|CELL|EPITHELIUM|MUSCLE|CULTURE|FIBROBLAST"
df$SampleType <- str_extract(str_to_upper(df$Extended.Description),type)
table(df$SampleType)
##
## BLOOD CELL CULTURE EPITHELIUM FIBROBLAST LUNG TISSUE
## 9 43 1 28 1 18
## MACROPHAGE MACROPHAGES MUSCLE SPUTUM
## 1 11 7 6
disease <- "CHRONIC OBSTRUCTIVE PULMONARY DISEASE|COPD|CANCER|ASTHMA"
df$Disease <-str_extract(str_to_upper(df$Extended.Description),disease) %>%
str_replace("CHRONIC OBSTRUCTIVE PULMONARY DISEASE", "COPD")
table(df$Disease)
##
## ASTHMA CANCER COPD
## 8 15 87
#write_delim(df,"GEO_data/2021-03-17_Parsed_table.txt", delim = "\t")
Then, we manually curated all datasets to define sample type (e.i. Blood, lung tissue, sputum, cell lines, etc). This table is provided in Supplementary Tables as Manual_curation. Then, we selected “BLOOD” and “LUNG_TISSUE” datasets for a deeper manual curation.
knitr::include_graphics(here::here("paper/fig/PRISMA.png"))
Data normalized is shown in figure bellow.
library(oligo)
library(affy)
library(SummarizedExperiment)
norm_plots <- function(name,x,plot, mfrow = c(2, 3),...){
pdf(here::here(name),height = 6,width = 10)
par(mfrow = mfrow ,mar=c(2,1,1,1))
for (i in 1:length(x)){
if(plot == "hist" & class(x[[i]]) =="ExpressionSet"){
print(paste0(gsub(".GSE.*","",names(x)[i]), " ExpressionSet"))
hist(x[[i]], main = gsub(".GSE.*","",names(x)[i]))
}
if(plot == "boxplot" & class(x[[i]]) =="ExpressionSet"){
print(paste0(gsub(".GSE.*","",names(x)[i]), " ExpressionSet" ))
boxplot(x[[i]], main = gsub(".GSE.*","",names(x)[i]))
}
if(plot == "hist" & class(x[[i]]) !="ExpressionSet"){
print(gsub(".GSE.*","",names(x)[i]))
#hist(assay(lung_norm[[i]]),100)
plotDensity(log2(assay(lung_norm[[i]])+1), main = gsub(".GSE.*","",names(x)[i]))
}
if(plot == "boxplot" & class(x[[i]]) !="ExpressionSet"){
print(gsub(".GSE.*","",names(x)[i]))
#hist(assay(lung_norm[[i]]),100)
boxplot(log2(assay(lung_norm[[i]])+1), main = gsub(".GSE.*","",names(x)[i]))
}
}
dev.off()
}
For blood datasets:
## Read data
blood_norm <- readRDS(here::here("Blood/output_data/Blood_2020-11-25_Step3_LungTissue-CURATED.RDS"))
#Plot images
norm_plots(name = "paper/fig/blood-norm_hist.pdf", x = blood_norm, plot = "hist")
## [1] "GSE112811 ExpressionSet"
## [1] "GSE56766 ExpressionSet"
## [1] "GSE42057 ExpressionSet"
## [1] "GSE94916 ExpressionSet"
## [1] "GSE76705 ExpressionSet"
## quartz_off_screen
## 2
norm_plots(name = "paper/fig/blood-norm_boxplot.pdf", x = blood_norm, plot = "boxplot")
## [1] "GSE112811 ExpressionSet"
## [1] "GSE56766 ExpressionSet"
## [1] "GSE42057 ExpressionSet"
## [1] "GSE94916 ExpressionSet"
## [1] "GSE76705 ExpressionSet"
## quartz_off_screen
## 2
knitr::include_graphics(here::here("paper/fig/blood-norm_hist.pdf"))
knitr::include_graphics(here::here("paper/fig/blood-norm_boxplot.pdf"))
For lung datasets:
## Read data
lung_norm <- readRDS(here::here("Tissue/output_data/2021-03-30_Step3_LungTissue-CURATED.RDS"))
lung_norm <- lapply(lung_norm, function(x){x[[1]]})
#Plot images
norm_plots(name = "paper/fig/lung-norm_hist.pdf", x = lung_norm, plot = "hist", mfrow = c(4, 3))
## [1] "GSE27597 ExpressionSet"
## [1] "GSE106986 ExpressionSet"
## [1] "GSE37768 ExpressionSet"
## [1] "GSE57148"
## [1] "GSE47460 ExpressionSet"
## [1] "GSE38974 ExpressionSet"
## [1] "GSE8581 ExpressionSet"
## [1] "GSE1650 ExpressionSet"
## [1] "GSE1122 ExpressionSet"
## [1] "GSE119040 ExpressionSet"
## [1] "GSE103174 ExpressionSet"
## [1] "GSE124180"
## quartz_off_screen
## 2
norm_plots(name = "paper/fig/lung-norm_boxplot.pdf", x = lung_norm, plot = "boxplot", mfrow = c(4, 3))
## [1] "GSE27597 ExpressionSet"
## [1] "GSE106986 ExpressionSet"
## [1] "GSE37768 ExpressionSet"
## [1] "GSE57148"
## [1] "GSE47460 ExpressionSet"
## [1] "GSE38974 ExpressionSet"
## [1] "GSE8581 ExpressionSet"
## [1] "GSE1650 ExpressionSet"
## [1] "GSE1122 ExpressionSet"
## [1] "GSE119040 ExpressionSet"
## [1] "GSE103174 ExpressionSet"
## [1] "GSE124180"
## quartz_off_screen
## 2
knitr::include_graphics(here::here("paper/fig/lung-norm_hist.pdf"))
knitr::include_graphics(here::here("paper/fig/lung-norm_boxplot.pdf"))
source(here::here("Lung-Blood/scripts/sample_size.R"))
knitr::include_graphics(here::here("Lung-Blood/fig/sample_size.pdf"))